# OCR enhancement

Trocr Ajami
A model focused on converting image content into text information, with wide application value.
Image-to-Text TensorBoard Other
T
TutlaytAI
139
0
Aya Vision 8b
Aya Vision 8B is an open-weight 8-billion-parameter multilingual vision-language model supporting visual and language tasks in 23 languages.
Image-to-Text Transformers Supports Multiple Languages
A
CohereLabs
29.94k
282
Internvit 300M 448px
MIT
InternViT-300M-448px is an efficient vision foundation model developed through knowledge distillation from InternViT-6B-448px-V1-5, featuring dynamic input resolution of 448×448 and supporting 1 to 40 patch processing.
Text-to-Image Transformers
I
OpenGVLab
7,506
57
Idefics2 8b Chatty
Apache-2.0
Idefics2 is an open multimodal model capable of accepting arbitrary sequences of images and text as input and generating text output. The model can answer questions about images, describe visual content, create stories based on multiple images, or function purely as a language model.
Image-to-Text Transformers English
I
HuggingFaceM4
617
94
Internvit 6B 448px V1 5
MIT
InternViT-6B-448px-V1-5 is a vision foundation model fine-tuned based on InternViT-6B-448px-V1-2, featuring strong robustness, OCR capabilities, and high-resolution processing.
Text-to-Image Transformers
I
OpenGVLab
155
79
Internvit 6B 448px V1 2
MIT
InternViT-6B-448px-V1-2 is a foundational vision model with a feature backbone, comprising 55.4 million parameters, supporting image processing at 448x448 pixels.
Text-to-Image Transformers
I
OpenGVLab
19
27
Donut Base Payslips
MIT
Document understanding model based on Donut architecture, specifically fine-tuned for payslip image processing
Text Recognition Transformers
D
Assadullah
20
0
Trocr Captcha
MIT
This model is an open-source model based on the MIT license, with a CER (Character Error Rate) of 0.0019, indicating high accuracy in specific tasks.
Large Language Model Transformers
T
tomofi
37
5
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase